Ranking problem



Learning List-Level Domain-Invariant Representations for Ranking

Neural Information Processing Systems

Domain adaptation aims to transfer knowledge learned on (data-rich) source domains to (low-resource) target domains, and a popular method is invariant representation learning, which matches and aligns the data distributions in the feature space. Although this method has been studied extensively and applied to classification and regression problems, its adoption in ranking problems is sporadic, and the few existing implementations lack theoretical justification. This paper revisits invariant representation learning for ranking. Upon reviewing prior work, we found that it implements what we call item-level alignment, which aligns the distributions of the items being ranked from all lists in aggregate but ignores their list structure. However, the list structure should be leveraged, because it is intrinsic to ranking problems, where the data and the metrics are defined and computed on lists, not on the items by themselves. To close this discrepancy, we propose list-level alignment: learning domain-invariant representations at the higher level of lists. The benefits are twofold: it leads to the first domain-adaptation generalization bound for ranking, in turn providing theoretical support for the proposed method, and it achieves better empirical transfer performance for unsupervised domain adaptation on ranking tasks, including passage reranking.
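The contrast between item-level and list-level alignment can be made concrete with a toy sketch. Here mean pooling stands in for a permutation-invariant list encoder, and a squared difference of feature means stands in for a distribution divergence such as MMD; the paper's actual encoder and divergence may well differ.

```python
import numpy as np

def linear_mmd(a, b):
    """Squared distance between feature means: a simple stand-in for a
    distribution divergence such as MMD."""
    return float(np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2))

def item_level_divergence(src_lists, tgt_lists):
    # Flatten all items from all lists, ignoring the list structure.
    return linear_mmd(np.vstack(src_lists), np.vstack(tgt_lists))

def list_level_divergence(src_lists, tgt_lists):
    # Embed each list by mean pooling (a placeholder for a learned,
    # permutation-invariant list encoder), then compare list embeddings.
    src = np.stack([lst.mean(axis=0) for lst in src_lists])
    tgt = np.stack([lst.mean(axis=0) for lst in tgt_lists])
    return linear_mmd(src, tgt)
```

On domains whose pooled items happen to share a mean but whose lists differ (for instance, in length), the item-level divergence can be zero while the list-level one is not, which is precisely the structure that item-level alignment discards.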


Ranking Data with Continuous Labels through Oriented Recursive Partitions

Neural Information Processing Systems

We formulate a supervised learning problem, referred to as continuous ranking, where a continuous real-valued label Y is assigned to an observable r.v. X taking its values in a feature space X, and the goal is to order all possible observations x in X by means of a scoring function s: X → R so that s(X) and Y tend to increase or decrease together with highest probability. This problem generalizes bi-/multi-partite ranking to a certain extent, and the task of finding optimal scoring functions s(x) can be naturally cast as optimization of a dedicated functional criterion, called the IROC curve here, or as maximization of the Kendall τ related to the pair (s(X), Y). From the theoretical side, we describe the optimal elements of this problem and provide statistical guarantees for empirical Kendall τ maximization under appropriate conditions on the class of scoring-function candidates. We also propose a recursive statistical learning algorithm tailored to empirical IROC curve optimization, producing a piecewise constant scoring function that is fully described by an oriented binary tree. Preliminary numerical experiments highlight the difference in nature between regression and continuous ranking and provide strong empirical evidence of the performance of empirical optimizers of the proposed criteria.
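As a concrete reference point, the empirical Kendall τ between scores s(X) and labels Y that the abstract refers to can be computed directly over pairs. This naive O(n²) sketch counts tied pairs as zero contributions (a τ-a-style simplification):

```python
from itertools import combinations

def sign(a, b):
    # +1 if a > b, -1 if a < b, 0 on ties
    return (a > b) - (a < b)

def kendall_tau(s, y):
    """Empirical Kendall tau between scores s and labels y.
    Naive O(n^2) pair count; tied pairs contribute zero."""
    pairs = list(combinations(range(len(s)), 2))
    conc = sum(sign(s[i], s[j]) * sign(y[i], y[j]) for i, j in pairs)
    return conc / len(pairs)
```

A scoring function that orders observations exactly as Y does attains τ = 1, a fully reversed one attains τ = -1, which is why maximizing the empirical τ is a natural surrogate objective here.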



A theoretical guarantee for SyncRank

Rao, Yang

arXiv.org Machine Learning

The statistical ranking problem--inferring a global ordering of items from incomplete and noisy pairwise comparisons--emerges naturally in competitive sports analysis, preference aggregation, and economic exchange systems. Traditional approaches rooted in social choice theory often falter when confronted with modern datasets characterized by two pervasive challenges: (1) the comparisons are sparsely observed, with measurement graphs far from complete; (2) the noise exhibits strong heterogeneity, where certain subsets of comparisons may be significantly more reliable than others. These limitations are particularly evident in applications like soccer league standings analysis, where the outcome matrix contains both structured noise (e.g., home advantage biases) and random outliers. Recent advances in group synchronization theory provide a novel geometric perspective for this classical problem. By mapping player ranks to phases on the unit circle and rank differences to angular offsets, the ranking task becomes equivalent to solving an instance of the angular synchronization problem over SO(2). This reformulation inherits key theoretical guarantees from the synchronization literature: spectral and semidefinite programming (SDP) relaxations can provably recover the underlying ranking under quantifiable noise thresholds. Crucially, the circular representation inherently handles cyclic inconsistencies through phase wrapping, circumventing the need for explicit outlier removal mechanisms required by linear embedding approaches.
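A minimal sketch of the spectral relaxation described above: rank differences are mapped to angular offsets, assembled into a Hermitian matrix, and the phases of its leading eigenvector are sorted to recover the ordering. This is a noiseless illustration of the angular-synchronization idea only; the thresholds, the SDP variant, and the noise handling discussed above are omitted.

```python
import numpy as np

def sync_rank(offsets, n):
    """Recover an ordering from angular-offset measurements.

    offsets: dict mapping (i, j) -> measured angle theta_i - theta_j,
    where theta_i encodes item i's rank as a phase on the unit circle
    (spread over less than a half circle to avoid phase wrapping).
    """
    H = np.zeros((n, n), dtype=complex)
    for (i, j), t in offsets.items():
        H[i, j] = np.exp(1j * t)   # e^{i (theta_i - theta_j)}
        H[j, i] = np.exp(-1j * t)  # Hermitian counterpart
    np.fill_diagonal(H, 1.0)
    _, V = np.linalg.eigh(H)
    v = V[:, -1]                       # leading eigenvector ~ e^{i theta}
    rel = np.angle(v * np.conj(v[0]))  # phases relative to item 0
    return np.argsort(rel)             # ascending phase = ascending rank
```

With complete, consistent offsets the matrix is rank one and recovery is exact; under noise the leading eigenvector degrades gracefully, which is the regime the quantifiable guarantees address.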


Assumption-free stability for ranking problems

Liang, Ruiting, Soloff, Jake A., Barber, Rina Foygel, Willett, Rebecca

arXiv.org Machine Learning

In this work, we consider ranking problems among a finite set of candidates: for instance, selecting the top-k items among a larger list of candidates or obtaining the full ranking of all items in the set. These problems are often unstable, in the sense that estimating a ranking from noisy data can exhibit high sensitivity to small perturbations. Concretely, if we use data to provide a score for each item (say, by aggregating preference data over a sample of users), then for two items with similar scores, small fluctuations in the data can alter the relative ranking of those items. Many existing theoretical results for ranking problems assume a separation condition to avoid this challenge, but real-world data often contains items whose scores are approximately tied, limiting the applicability of existing theory. To address this gap, we develop a new algorithmic stability framework for ranking problems, and propose two novel ranking operators for achieving stable ranking: the inflated top-k for the top-k selection problem and the inflated full ranking for ranking the full list. To enable stability, each method allows for expressing some uncertainty in the output. For both problems, our proposed methods provide guaranteed stability, with no assumptions on data distributions and no dependence on the total number of candidates to be ranked. Experiments on real-world data confirm that the proposed methods offer stability without compromising the informativeness of the output.
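The flavor of an "inflated" selection can be conveyed with a deliberately simple stand-in: return every item whose score comes within eps of the k-th largest, so that near-ties expand the output set instead of being broken arbitrarily. This only illustrates the idea of expressing uncertainty in the output; the paper's operators and their stability guarantees are constructed differently.

```python
def inflated_top_k(scores, k, eps):
    """Indices of all items whose score is within eps of the k-th largest.

    Near-ties inflate the output beyond k items; eps = 0 recovers
    ordinary top-k selection (up to exact ties). Illustrative sketch,
    not the paper's operator.
    """
    kth = sorted(scores, reverse=True)[k - 1]
    return [i for i, s in enumerate(scores) if s >= kth - eps]
```

With scores [10, 9.9, 5, 1] and k = 1, a small eps returns both near-tied leaders rather than committing to one of them, which is exactly the instability the separation-free framework is designed to absorb.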


SerialRank: Spectral Ranking using Seriation

Fajwel Fogel, Alexandre d'Aspremont, Milan Vojnovic

Neural Information Processing Systems

We describe a seriation algorithm for ranking a set of n items given pairwise comparisons between these items. Intuitively, the algorithm assigns similar rankings to items that compare similarly with all others. It does so by constructing a similarity matrix from pairwise comparisons, using seriation methods to reorder this matrix and construct a ranking. We first show that this spectral seriation algorithm recovers the true ranking when all pairwise comparisons are observed and consistent with a total order. We then show that ranking reconstruction is still exact even when some pairwise comparisons are corrupted or missing, and that seriation-based spectral ranking is more robust to noise than other scoring methods. An additional benefit of the seriation formulation is that it allows us to solve semi-supervised ranking problems. Experiments on both synthetic and real datasets demonstrate that seriation-based spectral ranking achieves competitive and in some cases superior performance compared to classical ranking methods.
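The pipeline in the abstract, building a similarity matrix from comparisons and then seriating it spectrally, can be sketched in a few lines using the pairwise-match similarity S = (n·11ᵀ + CCᵀ)/2 and the Fiedler vector of its Laplacian; the recovered order is determined only up to reversal.

```python
import numpy as np

def serialrank(C):
    """Spectral seriation sketch for a comparison matrix C, where
    C[i, j] = 1 if i beat j, -1 if j beat i, and 0 if unobserved."""
    n = C.shape[0]
    S = (n * np.ones((n, n)) + C @ C.T) / 2.0  # pairwise-match similarity
    L = np.diag(S.sum(axis=1)) - S             # Laplacian of the similarity graph
    _, V = np.linalg.eigh(L)
    fiedler = V[:, 1]                          # second-smallest eigenvector
    return np.argsort(fiedler)                 # seriation order, up to reversal
```

Entry (i, j) of CCᵀ counts, up to sign, how often i and j compare the same way against the other items, which is the "items that compare similarly get similar ranks" intuition stated above.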



Minimum Weighted Feedback Arc Sets for Ranking from Pairwise Comparisons

Vahidi, Soroush, Koutis, Ioannis

arXiv.org Artificial Intelligence

The Minimum Weighted Feedback Arc Set (MWFAS) problem is fundamentally connected to the Ranking Problem -- the task of deriving global rankings from pairwise comparisons. Recent work [He et al., ICML 2022] has advanced the state of the art for the Ranking Problem using learning-based methods, improving upon multiple previous approaches. However, the connection to MWFAS remains underexplored. This paper investigates this relationship and presents efficient combinatorial algorithms for solving MWFAS, thus addressing the Ranking Problem. Our experimental results demonstrate that these simple, learning-free algorithms not only significantly outperform learning-based methods in terms of speed but also generally achieve superior ranking accuracy.
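For concreteness, one classic combinatorial baseline for (weighted) feedback arc set is the Eades-Lin-Smyth greedy heuristic: repeatedly peel off sinks and sources, and otherwise pick the vertex with the largest out-weight minus in-weight. This is a generic baseline sketch for the same problem, not the authors' algorithms.

```python
def greedy_order(n, w):
    """Eades-Lin-Smyth-style greedy heuristic for weighted feedback arc set.

    w: dict mapping (i, j) -> weight of evidence that i should precede j.
    Returns a linear order of 0..n-1; the arcs pointing backward in this
    order form the heuristic feedback arc set.
    """
    remaining = set(range(n))
    left, right = [], []

    def out_w(v):
        return sum(w.get((v, u), 0) for u in remaining if u != v)

    def in_w(v):
        return sum(w.get((u, v), 0) for u in remaining if u != v)

    while remaining:
        sink = next((v for v in remaining if out_w(v) == 0), None)
        if sink is not None:
            right.append(sink)
            remaining.remove(sink)
            continue
        source = next((v for v in remaining if in_w(v) == 0), None)
        if source is not None:
            left.append(source)
            remaining.remove(source)
            continue
        best = max(remaining, key=lambda v: out_w(v) - in_w(v))
        left.append(best)
        remaining.remove(best)
    return left + right[::-1]
```

When the comparison graph is already acyclic (a consistent tournament), the heuristic reduces to sink peeling and returns the exact topological ranking with an empty feedback arc set.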


Reviews: TopRank: A practical algorithm for online stochastic ranking

Neural Information Processing Systems

This paper tackles the problem of learning to rank items in the online setting. This paper proposes a new algorithm that uses confidence intervals on item-preference orderings to decide which item to assign to each position. Results show that the proposed approach empirically outperforms the current state of the art in the re-ranking setting. The paper also provides an upper bound on the regret for the proposed approach, and a lower bound on the regret in ranking problems. Quality: I found the paper of overall good quality.
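The core primitive the review describes, deciding from click data that one item is preferred to another, can be sketched as a pairwise confidence test: declare i better than j only when the observed win margin clears a Hoeffding/Azuma-style radius. The constant and the log term here are illustrative choices, not TopRank's exact statistic.

```python
import math

def confidently_better(wins_ij, wins_ji, delta=0.05):
    """Declare item i preferred to item j only when the win margin
    exceeds a Hoeffding/Azuma-style confidence radius.

    delta is a (hypothetical) failure probability; the exact radius
    used by TopRank differs in its constants.
    """
    n = wins_ij + wins_ji
    if n == 0:
        return False  # no evidence yet
    radius = math.sqrt(2.0 * n * math.log(math.sqrt(n) / delta))
    return (wins_ij - wins_ji) > radius
```

A small edge such as 5 wins to 4 is not enough evidence to commit, while a lopsided record is; orderings that pass such tests are what the algorithm uses to assign items to positions.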